Abstract

We study the problem of learning the most biased coin among a set of coins by tossing the 
coins adaptively. The goal is to minimize the number of tosses to identify a coin $i^*$ such that
$\Pr(\text{coin } i^* \text{ is most biased})$ is at least $1 - \delta$ for any given $\delta$. Under a particular probabilistic
model, we give an optimal algorithm, i.e., an algorithm that minimizes the expected number 
of tosses, to learn a most biased coin. The problem is equivalent to finding the best arm in 
the multi-armed bandit problem using adaptive strategies. Dar et al. (2002) [7] and Mannor 
and Tsitsiklis (2004) [12] show upper and lower bounds matching up to constant factors on
the number of coin tosses for several underlying settings of the bias probabilities. For a class 
of such settings we bridge the constant factor gap by giving an optimal adaptive strategy - a 
strategy that performs the best possible action under any given history of outcomes. For any 
given history, tossing the coin chosen by our strategy minimizes the expected number of tosses 
needed to learn a most biased coin. To our knowledge, this is the first algorithm that employs 
an optimal adaptive strategy under a Bayesian setting for this problem. 

1 Introduction 

The multi-armed bandit problem is a classical decision-theoretic problem with applications in bioin- 
formatics, medical trials, stochastic algorithms, etc. The input to the problem is a set of arms, 
each associated with an unknown stochastic reward. At each step, an agent chooses an arm and 
receives a reward. The objective is to find a strategy for choosing the arms in order to achieve the 
best expected reward asymptotically. This problem has spawned a rich literature on the tradeoff 
between exploration and exploitation while choosing the arms [H 1 1 1|. I2j 15]. 

The motivation to identify the best bandit arm arises from problems where one would like to 
minimize regret within a fixed budget. In the models considered in Bubeck et al. [5], Audibert 
et al. [1] and Gabillon et al. [10], the goal is to choose an arm after a finite number of steps to
minimize regret, where regret is defined to be the difference between the expected reward of the 
chosen arm and the reward of the optimal arm. This line of work suggested that the exploration- 
exploitation tradeoffs for this setting are much different from the setting where the number of steps 
is asymptotic. 

In this work, we focus on the sample complexity of the pure exploration problem. In the pure 
exploration problem, given any $\delta > 0$, the goal is to identify the arm with maximum expected
reward with error probability at most $\delta$ while minimizing the total number of steps needed. This
is also equivalent to the problem of learning the most biased coin among a set of coins by tossing 
them adaptively. Over the past decade, the pure exploration problem has garnered the attention 
of the learning theory community IT2~1 El El [H [TO] . The problem was introduced as a PAC-style
learning problem by Dar et al. [7]. Given a collection of $n$ arms, they showed that a total of
$O((n/\epsilon^2)\log(1/\delta))$ steps is sufficient to identify an arm whose expected reward is at most $\epsilon$ away
from the optimal arm with correctness at least $1 - \delta$. Mannor and Tsitsiklis [12] showed lower
bounds matching up to constant factors under various settings of the rewards. We attempt to 
bridge the constant factor gap by addressing the problem from a decision-theoretic perspective. 
Given the history of outcomes, does there exist a strategy to choose an arm so that the expected 
number of steps needed to learn the best arm is minimized? 

Since learning the arm with the maximum expected reward in the multi-armed bandit setting is 
equivalent to learning the most biased coin by tossing them adaptively, we will focus on the latter 
for the rest of the paper. An interesting case of the problem is when the most biased coin and the 
second-most biased coin differ in their probability of heads by at least $\epsilon$. Under this assumption, a
Chernoff bound leads to a trivial upper bound on the number of tosses in the non-adaptive setting:
toss each coin $k = (4/\epsilon^2)\log(n/\delta)$ times and output the coin with the maximum number of heads
outcomes. Let $\hat{p}_i$ denote the empirical probability of heads for the $i$th coin. By the Chernoff bound,
$|\hat{p}_i - p_i| < \epsilon/2$ with probability at least $1 - \delta/n$. Therefore, by the union bound, it follows that this
trivial toss-each-coin-$k$-times method outputs the most biased coin with probability at least $1 - \delta$.
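
The non-adaptive baseline just described can be sketched in a few lines of Python. This is an illustration only: the function names, the rounding of the per-coin toss count up to an integer, and the example biases are assumptions made for the sketch, not part of the original text.

```python
import math
import random

def tosses_per_coin(eps, delta, n):
    """Per-coin toss count (4 / eps^2) * log(n / delta), rounded up (rounding is an assumption)."""
    return math.ceil((4.0 / eps**2) * math.log(n / delta))

def toss_each_coin_k_times(biases, eps, delta, rng):
    """Toss every coin k times; return (index of the coin with the most heads, k)."""
    k = tosses_per_coin(eps, delta, len(biases))
    # Count heads for each coin over k independent tosses.
    heads = [sum(rng.random() < b for _ in range(k)) for b in biases]
    best = max(range(len(biases)), key=heads.__getitem__)
    return best, k

# Hypothetical usage: four coins, the third one is the most biased.
# best, k = toss_each_coin_k_times([0.4, 0.4, 0.6, 0.4], eps=0.2, delta=0.05,
#                                  rng=random.Random(0))
```

The total cost of this method is `k * n` tosses regardless of the outcomes, which is the quantity an adaptive strategy tries to improve on.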

In this work, we give a simple yet optimal strategy for choosing coins to toss in the pure explo- 
ration problem in a particular probabilistic setting. We show that our strategy is "optimal" - it 
performs the best possible action for any given history. Given the current history of outcomes of 
all coins, tossing the coin chosen by our strategy minimizes the expected number of tosses needed 
to learn a most biased coin. Our strategy also improves on the number of coin tosses needed in 
comparison to the simple toss-each-coin-$k$-times method.

Setting. We are given a collection of $n$ coins with $p_i := \Pr(\text{Heads for the } i\text{th coin})$. A coin $i$ is said
to be heavy if $p_i = p + \epsilon$ and not-heavy if $p_i = p - \epsilon$ for some given $\epsilon \in (0, 1/2)$ and $p \in [\epsilon, 1 - \epsilon]$. Each
coin in the collection is heavy with probability $\alpha$ and not-heavy with probability $1 - \alpha$. Similar
to most learning theory problems, given any $\delta > 0$, we would like to identify a heavy coin with
correctness probability at least $1 - \delta$. The algorithm is allowed to toss coins adaptively. The goal
is to minimize the expected number of tosses required. 

An adaptive strategy is allowed to choose a coin to toss based on the current history of outcomes 
of the coin tosses. Given the history of outcomes of coin tosses, the cost of an adaptive strategy 
is equal to the expected number of further coin tosses needed to identify a most biased coin with 
correctness probability at least $1 - \delta$ by following the strategy. An adaptive strategy is said to be
optimal if it has the minimum cost. Thus, an optimal adaptive strategy minimizes the expected
number of tosses to identify a heavy coin with error at most $\delta$.

The collection of coins may not contain a heavy coin if $n$ is smaller than $1/\alpha$. So, for the sake
of simplicity, we assume that n is sufficiently large. 

1.1 Results 

Our main result is an optimal adaptive algorithm for the above setting. 

Theorem 1. Given any $\delta > 0$, there exists an algorithm $A$ that employs an optimal adaptive
strategy to identify a heavy coin with correctness at least $1 - \delta$ in the above setting. At any step,
the time taken by $A$ to identify the coin to toss is $O(\log n)$.

We also quantify the improvement over the non-adaptive toss-each-coin-$k$-times method given
by the Chernoff bound. We give an upper bound on the expected number of tosses performed by our
algorithm. We assume an infinite supply of coins under the same probabilistic setting. Then the
toss-each-coin-$k$-times method is the following: repeatedly take a fresh coin and toss it $(4/\epsilon^2)\log(1/\delta)$ times;
if the empirical probability of heads of the coin is at least $p - \epsilon/2$, then output that coin and
terminate. This trivial algorithm will find a heavy coin after examining $1/\alpha$ coins in expectation.
Thus, the expected number of tosses performed by this trivial algorithm is

\[ \left(\frac{1}{\alpha}\right)\left(\frac{4}{\epsilon^2}\right)\log\frac{1}{\delta}. \]

In contrast, our algorithm requires a much smaller number of tosses.

Theorem 2. Assuming an infinite supply of coins, there exists an algorithm $A$ such that the
expected number of tosses performed by $A$ to identify a heavy coin with correctness at least $1 - \delta$ is
at most

\[ \frac{16}{\epsilon^2}\left(\frac{1-\alpha}{\alpha} + \log\frac{(1-\alpha)(1-\delta)}{\alpha\delta}\right). \]
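
To get a feel for the two quantities, the short sketch below evaluates the expected cost of the trivial method, taken here as $(1/\alpha)(4/\epsilon^2)\log(1/\delta)$, next to the bound of Theorem 2, taken here as $(16/\epsilon^2)\left((1-\alpha)/\alpha + \log((1-\alpha)(1-\delta)/(\alpha\delta))\right)$. Both formulas are as read from the surrounding text; the function names and the sample parameters are assumptions made for illustration.

```python
import math

def trivial_expected_tosses(eps, alpha, delta):
    """1/alpha coins in expectation, each tossed (4 / eps^2) * log(1 / delta) times."""
    return (1.0 / alpha) * (4.0 / eps**2) * math.log(1.0 / delta)

def theorem2_bound(eps, alpha, delta):
    """(16 / eps^2) * ((1 - alpha) / alpha + log((1 - alpha)(1 - delta) / (alpha * delta)))."""
    b = math.log((1 - alpha) * (1 - delta) / (alpha * delta))
    return (16.0 / eps**2) * ((1 - alpha) / alpha + b)

# Hypothetical parameters: rare heavy coins and increasingly strict error targets.
for delta in (1e-2, 1e-6, 1e-12):
    print(delta,
          round(trivial_expected_tosses(0.1, 0.01, delta)),
          round(theorem2_bound(0.1, 0.01, delta)))
```

In the first expression the $\log(1/\delta)$ term is multiplied by $1/\alpha$, while in the second it is only added to $(1-\alpha)/\alpha$, so the gap widens as $\delta$ shrinks.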

1.2 Algorithm

Algorithm Likelihood-Toss

1. Initialize $L_i \leftarrow 1$ for every $i \in [n]$.

2. While ($L_i < (1 - \alpha)(1 - \delta)/(\alpha\delta)$ for all $i \in [n]$)

   (a) Toss coin $i^*$ such that $i^* = \arg\max\{L_i : i \in [n]\}$. (Break ties
       arbitrarily.) Let $b_{i^*} = 1$ if the outcome is heads and $b_{i^*} = 0$ if the outcome is tails.

   (b) Update $L_{i^*} \leftarrow L_{i^*} \left(\frac{p+\epsilon}{p-\epsilon}\right)^{b_{i^*}} \left(\frac{q-\epsilon}{q+\epsilon}\right)^{1-b_{i^*}}$.

3. Output $\arg\max\{L_i : i \in [n]\}$.
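
A direct transcription of these steps into Python might look as follows. It is a sketch under stated assumptions: the coins are simulated by a list `biases`, the `max_tosses` guard is added so the loop always ends on a finite collection, and the code requires $p - \epsilon > 0$ and $q - \epsilon > 0$ so that the update factors are finite and positive. None of these names come from the original text.

```python
import random

def likelihood_toss(biases, p, eps, alpha, delta, rng, max_tosses=10**6):
    """Simulate Likelihood-Toss on coins with the given head probabilities.

    Returns (index of the output coin, number of tosses performed).
    """
    n = len(biases)
    q = 1.0 - p
    threshold = (1 - alpha) * (1 - delta) / (alpha * delta)
    up = (p + eps) / (p - eps)      # factor applied on heads
    down = (q - eps) / (q + eps)    # factor applied on tails
    L = [1.0] * n                   # step 1: all likelihood ratios start at 1
    tosses = 0
    # Step 2: continue while every ratio is below the threshold.
    while max(L) < threshold and tosses < max_tosses:
        i = max(range(n), key=L.__getitem__)   # ties go to the smallest index
        heads = rng.random() < biases[i]
        L[i] *= up if heads else down
        tosses += 1
    # Step 3: output the coin with the largest ratio.
    return max(range(n), key=L.__getitem__), tosses
```

This version scans all coins at every step, so selecting the coin costs time linear in $n$; it favours readability over speed.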

Notation. Let $q = 1 - p$, $\Delta_H = \log((p + \epsilon)/(p - \epsilon))$, $\Delta_T = \log((q + \epsilon)/(q - \epsilon))$, and
$B = \log((1 - \alpha)(1 - \delta)/(\alpha\delta))$. At any stage of the algorithm, let the history of outcomes of a coin
$i$ be $D_i = (h_i, t_i)$ where $h_i$ and $t_i$ refer to the number of outcomes that were heads and tails
respectively. Given the history $D_i$, the likelihood ratio of the coin is defined to be

\[ L_i := \frac{\Pr(D_i \mid \text{Coin } i \text{ is heavy})}{\Pr(D_i \mid \text{Coin } i \text{ is not-heavy})} = \left(\frac{p+\epsilon}{p-\epsilon}\right)^{h_i} \left(\frac{q-\epsilon}{q+\epsilon}\right)^{t_i}. \]
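
In logarithmic form the likelihood ratio is simply $h_i \Delta_H - t_i \Delta_T$, which is often more convenient numerically. The helper names below are hypothetical and natural logarithms are assumed; the sketch only restates the notation above.

```python
import math

def log_likelihood(h, t, p, eps):
    """log L_i for a coin with h heads and t tails: h * Delta_H - t * Delta_T."""
    q = 1.0 - p
    delta_h = math.log((p + eps) / (p - eps))
    delta_t = math.log((q + eps) / (q - eps))
    return h * delta_h - t * delta_t

def stopping_barrier(alpha, delta):
    """B = log((1 - alpha)(1 - delta) / (alpha * delta))."""
    return math.log((1 - alpha) * (1 - delta) / (alpha * delta))

# Hypothetical usage: a coin is accepted once its log-likelihood reaches B.
# accepted = log_likelihood(9, 2, p=0.5, eps=0.2) >= stopping_barrier(0.1, 0.05)
```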

2 Preliminaries



Our proof of optimality is based on an optimal strategy for multitoken Markov games. We now 
formally define the multitoken Markov game and state the optimal strategy that has been studied 
for this game. We use the notation and results from [8]. 

A Markov system $S = (V, P, C, s, t)$ consists of a state space $V$, a transition probability function
$P : V \times V \to [0, 1]$, a positive real cost $C_v$ associated with each state $v$, a start state $s$ and a target
state $t$. Let $v(0), v(1), \ldots, v(k)$ denote a sequence of states taken by following the Markov system for $k$
steps. The cost of such a trip on $S$ is the sum $\sum_{i=0}^{k-1} C_{v(i)}$ of the costs of the exited states.

Let $S_1, \ldots, S_n$ be $n$ Markov systems, each of which has a token on its starting state. A simple
multitoken Markov game $G = S_1 \circ S_2 \circ \cdots \circ S_n$ consists of a succession of steps in which we choose
one of the $n$ tokens, which takes a random step in its system (i.e., according to its $P_i$). After
choosing a token $i$ on state $u$, say, we pay the cost $C_i(u)$ associated with the state $u$ of the system
$S_i$. We terminate as soon as one of the tokens reaches its target state for the first time. A strategy
denotes the policy employed to pick a token given the state of the $n$ Markov systems. The cost of
such a game $E[G]$ is the minimum expected cost taken over all possible strategies. The strategy
that achieves the minimum expected cost is said to be optimal. A strategy is said to be pure if
the choice of the token at any step is deterministic (entirely determined by the state of all Markov
systems).

Theorem 3 ([8]). Every Markov game has a pure optimal strategy.

For any strategy $\pi$ for a Markov game $G$, we denote the expected cost incurred by playing $\pi$
on $G$ by $E_\pi[G]$.

The pure optimal strategy in the multitoken Markov game is completely determined by the
grade $\gamma$ of the states of the systems. The grade $\gamma$ of a state is defined as follows: Given a Markov
system $S = (V, P, C, s, t)$ and state $u$, let $S(u) = (V, P, C, u, t)$ denote the Markov system whose
starting state is $u$. Consider the Markov game $S_g(u)$, where at any step of the game one is
allowed to either play in $S(u)$ or quit. Quitting incurs a cost of $g$. Playing in $S(u)$ is equivalent
to taking a step following the Markov system $S$ incurring the cost associated with the state of
the system. The game stops once the target state is reached or once we quit. The grade $\gamma(u)$
of state $u$ is defined to be the smallest real value $g$ such that there exists an optimal strategy $\sigma$
that plays in $S(u)$ in the first step. We note that, by definition, the cost of the game $S_{\gamma(u)}(u)$ is
$E[S_{\gamma(u)}(u)] = \gamma(u) = E_\sigma[S_{\gamma(u)}(u)]$.

Theorem 4 ([8]). Given the states $u_1, \ldots, u_n$ of the Markov systems in the multitoken Markov game,
the unique optimal strategy is to pick the token $i$ such that $\gamma(u_i)$ is minimal.

We use the following results from [9] to bound the number of tosses.

Theorem 5. Let $X \in [-\nu, \mu]$ be the random variable that determines the step-sizes of a one
dimensional random walk with absorbing barriers at $-L$ and $W$. Let $L^* = L + \nu$, $W^* = W + \mu$
and $\phi(\rho) := E(\rho^X)$.

1. The function $\phi(\rho)$ is convex. If $E(X) \neq 0$, there exists a unique $\rho_0 \in (0, 1) \cup (1, \infty)$ such that
$\phi(\rho_0) = 1$. If $E(X) < 0$, then $\rho_0 > 1$ and if $E(X) > 0$, then $\rho_0 < 1$.

2.
\[ \Pr(\text{Absorption at } W) \ge \frac{1 - \rho_0^{L}}{1 - \rho_0^{L + W^*}}. \]

3. If $E(X) < 0$, then
\[ E(\text{Number of steps to absorption}) \le \frac{L^*}{|E(X)|}. \]

4. If $E(X) > 0$, then
\[ E(\text{Number of steps to absorption}) \le \frac{L + W^*}{E(X)} \left( \frac{1 - \rho_0^{L^*}}{1 - \rho_0^{L^* + W}} \right). \]
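
As a concrete illustration of item 1, the sketch below locates $\rho_0$ by bisection for the log-likelihood walk of a heavy coin, whose steps are $+\Delta_H$ with probability $p + \epsilon$ and $-\Delta_T$ with probability $q - \epsilon$ in the notation of Section 1.2. With natural logarithms one can check directly that $\phi(1/e) = (p - \epsilon) + (q + \epsilon) = 1$, so the root should be $1/e$. The function names and the bracketing interval are assumptions made for this example.

```python
import math

def phi(rho, steps):
    """E[rho^X] for a step X that takes value s with probability w, for (s, w) in steps."""
    return sum(w * rho**s for s, w in steps)

def nontrivial_root(steps, lo, hi, iters=200):
    """Bisection for phi(rho) = 1 on [lo, hi]; assumes phi - 1 changes sign on the interval."""
    f_lo = phi(lo, steps) - 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = phi(mid, steps) - 1.0
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)

# Heavy-coin walk with hypothetical parameters p = 0.5, eps = 0.2.
p, eps = 0.5, 0.2
q = 1.0 - p
delta_h = math.log((p + eps) / (p - eps))
delta_t = math.log((q + eps) / (q - eps))
heavy_steps = [(delta_h, p + eps), (-delta_t, q - eps)]
rho0 = nontrivial_root(heavy_steps, lo=1e-6, hi=0.9)
```

Since $E(X) > 0$ for this walk, the root lies in $(0, 1)$, in line with the last sentence of item 1.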



3 Correctness 

We first argue the correctness of the algorithm. 
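
Lemma 6. Given the history $D_i$ for a coin $i$,

\[ \Pr(\text{Coin } i \text{ is heavy} \mid D_i) \ge 1 - \delta \quad \text{if and only if} \quad L_i \ge \left(\frac{1-\delta}{\delta}\right)\left(\frac{1-\alpha}{\alpha}\right). \]

Proof. The lemma is a straightforward application of Bayes' theorem.

\begin{align*}
\Pr(\text{Coin } i \text{ is heavy} \mid D_i) &= \frac{\Pr(D_i \mid \text{Coin } i \text{ is heavy}) \Pr(\text{Coin } i \text{ is heavy})}{\Pr(D_i)} \\
&= \frac{\alpha(p+\epsilon)^{h_i}(q-\epsilon)^{t_i}}{\alpha(p+\epsilon)^{h_i}(q-\epsilon)^{t_i} + (1-\alpha)(p-\epsilon)^{h_i}(q+\epsilon)^{t_i}} \\
&= \frac{\alpha L_i}{\alpha L_i + (1-\alpha)}.
\end{align*}

Thus, it follows that

\[ \Pr(\text{Coin } i \text{ is heavy} \mid D_i) \ge 1 - \delta \quad \text{if and only if} \quad L_i \ge \frac{(1-\delta)(1-\alpha)}{\delta\alpha}. \]

□

The algorithm computes the likelihood ratio $L_i$ for each coin $i$ based on the history of outcomes
of the coin. It immediately follows that Algorithm Likelihood-Toss outputs a coin $i^*$ such that

\[ \Pr(\text{Coin } i^* \text{ is heavy}) \ge 1 - \delta. \]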
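
The identity in the proof of Lemma 6 is easy to check numerically. The sketch below computes the posterior once directly from Bayes' theorem and once from the likelihood ratio; the function names and the sample history are hypothetical.

```python
def likelihood_ratio(h, t, p, eps):
    """L_i = ((p + eps) / (p - eps))^h * ((q - eps) / (q + eps))^t."""
    q = 1.0 - p
    return ((p + eps) / (p - eps)) ** h * ((q - eps) / (q + eps)) ** t

def posterior_heavy(h, t, p, eps, alpha):
    """Pr(heavy | h heads, t tails), computed directly from Bayes' theorem."""
    q = 1.0 - p
    heavy = alpha * (p + eps) ** h * (q - eps) ** t
    not_heavy = (1 - alpha) * (p - eps) ** h * (q + eps) ** t
    return heavy / (heavy + not_heavy)

def posterior_from_ratio(L, alpha):
    """The same posterior written as alpha * L / (alpha * L + (1 - alpha))."""
    return alpha * L / (alpha * L + (1 - alpha))

# With alpha = 0.1 and delta = 0.05 the threshold (1 - alpha)(1 - delta) / (alpha * delta)
# equals 171, and a ratio of exactly 171 corresponds to a posterior of 1 - delta = 0.95.
```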



4 Optimality of the Algorithm 

Consider the log-likelihood of a coin $i$ defined as $X_i := \log L_i$. Given the history of a coin, the
log-likelihood of the coin is determined uniquely. The influence of a toss on the log-likelihood is
a random step for $X_i$: if the outcome of the toss is a head, then $X_i \leftarrow X_i + \Delta_H$ and if the
outcome is a tail, then $X_i \leftarrow X_i - \Delta_T$. Thus, the toss outcomes of the coin lead to a 1-dimensional
random walk of the log-likelihood function associated with the coin. Further, since we stop tossing
as soon as the log-likelihood of a coin is at least $B = \log((1 - \alpha)(1 - \delta)/(\alpha\delta))$, the random walk
has an absorbing barrier at $B$. We observe that the random walks performed by the $n$ coins are
independent since each coin being heavy is independent of the rest of the coins.

Thus, we have $n$ identical Markov systems $S_1, \ldots, S_n$, each starting in state $X_i = 0$. Each
Markov system also has a target state, namely the boundary $B$. A strategy to pick a coin to toss is
equivalent to picking a Markov system $i \in [n]$. Each toss outcome is equivalent to the corresponding
system taking a step following the transition probability and step size of the system. The goal to 
minimize the expected number of tosses is equivalent to minimizing the expected number of steps 
for one of the Markov systems to reach the target state. 

Thus, we are essentially seeking an optimal strategy to play a multitoken Markov game. We 
show that the strategy employed by Algorithm Likelihood-Toss is an optimal strategy to play the
multitoken Markov game arising in our setting. 

Let the Markov system associated with the one-dimensional random walk of the log-likelihood
function of the history of the coin be $S = (V, P, C, s, t)$. Here, the state space $V$ consists of every
possible real value that is at most $B$. The target state is a special state determined by $t = B$.
The starting state is $s = 0$. Given the current state $X$, the transition cost incurred is one while
transition probabilities are defined as follows:

\[ X \to \begin{cases} \min\{X + \Delta_H, B\} & \text{with probability } \Pr(\text{Heads} \mid X), \\ X - \Delta_T & \text{with probability } 1 - \Pr(\text{Heads} \mid X), \end{cases} \]

where

\begin{align*}
\Pr(\text{Heads} \mid X) &= \Pr(\text{Heads} \mid \text{Heavy coin}) \Pr(\text{Heavy coin} \mid X) \\
&\quad + \Pr(\text{Heads} \mid \text{Non-heavy coin}) \Pr(\text{Non-heavy coin} \mid X) \\
&= \frac{(p+\epsilon)\alpha e^{X}}{\alpha e^{X} + (1-\alpha)} + \frac{(p-\epsilon)(1-\alpha)}{\alpha e^{X} + (1-\alpha)}.
\end{align*}
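
The transition rule can be written as two small functions. This is a sketch with hypothetical names, assuming natural logarithms so that $e^X$ is the likelihood ratio.

```python
import math

def prob_heads(x, p, eps, alpha):
    """Pr(Heads | X = x): mix the two biases by the posterior probability of being heavy."""
    w = alpha * math.exp(x)
    heavy_posterior = w / (w + (1 - alpha))
    return (p + eps) * heavy_posterior + (p - eps) * (1 - heavy_posterior)

def step(x, heads, delta_h, delta_t, barrier):
    """One transition of the log-likelihood walk, capped at the target state B."""
    return min(x + delta_h, barrier) if heads else x - delta_t
```

At $X = 0$ the posterior equals the prior $\alpha$, so `prob_heads(0, p, eps, alpha)` reduces to $\alpha(p+\epsilon) + (1-\alpha)(p-\epsilon)$, and the probability of heads increases with $X$ towards $p + \epsilon$.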

4.1 Proof of Optimality 

We now show that the grade is a monotonically non-increasing function of the log-likelihood. 

Lemma 7. Consider the Markov system $S = (V, P, C, s, t)$ associated with the log-likelihood
function. Let $X, Y \in V$ such that $X \ge Y$. Then $\gamma(X) \le \gamma(Y)$.

Proof. Let $\gamma(Y) = g$. Then, by definition of grade, it follows that there exists a pure optimal
strategy $\sigma$ that chooses to toss the coin in the first step in $S_g(Y)$ and $E_\sigma[S_g(Y)] = g$. We will
specify a mixed strategy $\pi$ for $S_g(X)$ such that $E_\pi[S_g(X)] \le g$ and it chooses to play in the system
$S(X)$ in the first step. It follows by definition that $\gamma(X) \le g$.

The pure strategy $\sigma$ can be expressed by a (possibly infinite) binary decision tree $D_\sigma$ as follows:
Each node $u$ has an associated label $l(u) \in \mathbb{R}$. Each edge has a label from $\{H, T\}$. The root node
$v$ is labeled $l(v) = Y$. On reaching $l(u) < B$, if $\sigma$ chooses to play in the system, then $u$ has two
children: the left and right children $u_L, u_R$ are labeled $l(u_L) = l(u) + \Delta_H$ and $l(u_R) = l(u) - \Delta_T$
respectively. The edges $(u, u_L), (u, u_R)$ are labeled $H$ and $T$ respectively. On reaching $l(u) < B$, if
$\sigma$ decides to quit, then $u$ is a leaf node. Finally, if $l(u) \ge B$, then $u$ is a leaf node. We observe that
since $\sigma$ plays in the system $S_g(Y)$ in the first step, the root of $D_\sigma$ is not a leaf.

We obtain a mixed strategy $\pi$ for $S_g(X)$ by considering the following ternary tree $D_\pi$ derived
from $D_\sigma$: Each node $u$ in $D_\pi$ has an associated label $(l_X(u), l_Y(u)) \in \mathbb{R}^2$. Each edge in $D_\pi$ has a
label from $\{HH, HT, TT\}$. There is an onto mapping $m(u)$ from each node $u \in D_\pi$ to a node in
$D_\sigma$. The root node $u$ is labeled $(l_X(u) = X, l_Y(u) = Y)$ and $m(u) = \mathrm{Root}(D_\sigma)$. For any node $u$, if
$m(u)$ is a leaf, then $u$ is a leaf. Let $u$ be a node such that $v = m(u)$ is not a leaf. Let $v_H$ and $v_T$
denote the left and right children of $v$. Create children $u_{HH}, u_{HT}, u_{TT}$ as nodes adjacent to edges
labeled $HH, HT, TT$ respectively. Define the mapping $m(u_{HH}) = v_H$, $m(u_{HT}) = v_T$, $m(u_{TT}) = v_T$
and set

\begin{align*}
l_X(u_{HH}) &= l_X(u) + \Delta_H, & l_X(u_{HT}) &= l_X(u) + \Delta_H, & l_X(u_{TT}) &= l_X(u) - \Delta_T, \\
l_Y(u_{HH}) &= l_Y(u) + \Delta_H, & l_Y(u_{HT}) &= l_Y(u) - \Delta_T, & l_Y(u_{TT}) &= l_Y(u) - \Delta_T.
\end{align*}

By construction of $D_\pi$, it follows that if $X \ge Y$, then at any node $u$ in $D_\pi$, $l_X(u) \ge l_Y(u)$ and
hence, $\Pr(\text{Heads} \mid l_X(u)) \ge \Pr(\text{Heads} \mid l_Y(u))$.

Figure 1 (panels: Binary Decision Tree $D_\sigma$; Ternary Decision Tree $D_\pi$): An example of a strategy
$\sigma$ represented as a binary decision tree $D_\sigma$ for the Markov game $S_g(Y)$ where
$B - 2\Delta_H < Y < B - \Delta_H$; the strategy $\sigma$ is to continue playing in the system $S_g(Y)$
on reaching states $Y$ and $Y + \Delta_H$ and to quit on reaching states $Y - \Delta_T$ and $Y + \Delta_H - \Delta_T$. The
corresponding ternary decision tree derived from $D_\sigma$ is also shown.

Our mixed strategy $\pi$ for $S_g(X)$ is based on $D_\pi$. The strategy at any step maintains a pointer
to some node $u$ in $D_\pi$. Initialize the pointer to the root node $u$. If the pointer is at a non-leaf node
$u$, then $\pi$ chooses to play in the system. If the step in the system is a backward step (outcome
of coin toss is a tail), then $\pi$ moves the pointer to $u_{TT}$. If the step in the system is a forward
step (outcome of coin toss is a head), then $\pi$ generates a random number $r \in [0, 1]$ and moves
the pointer to the node $u_{HH}$ if $r \le \Pr(\text{Heads} \mid l_Y(u)) / \Pr(\text{Heads} \mid l_X(u))$ and to the node $u_{HT}$ if
$r > \Pr(\text{Heads} \mid l_Y(u)) / \Pr(\text{Heads} \mid l_X(u))$. If the pointer is at a leaf node $u$ such that $l_Y(u) < B$, then
$\pi$ quits the system. Otherwise, $l_Y(u) \ge B$ and hence $l_X(u) \ge B$. Thus, the strategy $\pi$ is a valid
mixed strategy for $S_g(X)$ and $\pi$ plays in the system $S_g(X)$ in the first step.

It only remains to show that $E_\pi[S_g(X)] \le g$. This is shown in Claim 8. □

Claim 8.

\[ E_\pi[S_g(X)] \le g. \]

Proof. The cost of using $\sigma$ for $S_g(Y)$ can be simulated by running a random process in $D_\sigma$ and
considering an associated cost. For each non-leaf node in $D_\sigma$ associate a cost of 1 and for each leaf
node $u$ in $D_\sigma$ such that $l(u) < B$, associate a cost of $g$. Consider the following random process
$RP_1$: Begin at the root $u$ of $D_\sigma$. Repeatedly traverse the tree $D_\sigma$ by taking the left child with
probability $\Pr(\text{Heads} \mid l(u))$ and the right child with the remaining probability until a leaf node is
reached. The cost of the random process is the sum of the cost incurred along the nodes in the
path traversed by the random process. Let $E[D_\sigma]$ denote the expected cost. Then, by construction
of $D_\sigma$, it follows that $E[D_\sigma] = E_\sigma[S_g(Y)] = g$.

Next, we give a random process $RP_2$ on $D_\pi$ that relates the expected cost of following strategy
$\pi$ on $S_g(X)$ and the expected cost of following strategy $\sigma$ on $S_g(Y)$. We first associate a cost with
each node $u$ in $D_\pi$: For each non-leaf node $u$, if $l_X(u) < B$, then cost $c_X(u) = 1$, and if $l_Y(u) < B$,
then cost $c_Y(u) = 1$. For each leaf node $u$, if $l_X(u) < B$, then cost $c_X(u) = g$ and if $l_Y(u) < B$,
then cost $c_Y(u) = g$. The remaining costs are zero. Here, we observe that $c_X(u) \le c_Y(u)$ for every
node $u \in D_\pi$.

Now, the cost incurred by following strategy $\pi$ for $S_g(X)$ is the same as the cost incurred by
the following random process $RP_2$: Begin at the root node and repeatedly traverse the tree $D_\pi$ by
taking one of the three children at each non-leaf node until a leaf node is reached. On reaching
a non-leaf node $u$, traverse to $u_{HH}$ with probability $\Pr(\text{Heads} \mid l_Y(u))$, to $u_{HT}$ with probability
$\Pr(\text{Heads} \mid l_X(u)) - \Pr(\text{Heads} \mid l_Y(u))$ and to $u_{TT}$ with the remaining probability.
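
The three branch probabilities form a coupling of the two walks: the first coordinate sees a head with probability $\Pr(\text{Heads} \mid l_X(u))$ and the second with probability $\Pr(\text{Heads} \mid l_Y(u))$. The sketch below makes this explicit; the function names are hypothetical and `prob_heads` restates the formula for $\Pr(\text{Heads} \mid X)$ from Section 4 with natural logarithms.

```python
import math

def prob_heads(x, p, eps, alpha):
    """Pr(Heads | X = x) for the log-likelihood walk."""
    w = alpha * math.exp(x)
    heavy_posterior = w / (w + (1 - alpha))
    return (p + eps) * heavy_posterior + (p - eps) * (1 - heavy_posterior)

def coupled_branch_probs(l_x, l_y, p, eps, alpha):
    """Probabilities of the HH, HT and TT children at a node labeled (l_x, l_y), l_x >= l_y."""
    if l_x < l_y:
        raise ValueError("the construction assumes l_x >= l_y")
    p_x = prob_heads(l_x, p, eps, alpha)
    p_y = prob_heads(l_y, p, eps, alpha)
    return {"HH": p_y, "HT": p_x - p_y, "TT": 1.0 - p_x}

# Hypothetical node with labels (1.0, 0.0).
probs = coupled_branch_probs(1.0, 0.0, p=0.5, eps=0.2, alpha=0.1)
```

The X-coordinate moves forward on the branches HH and HT, whose probabilities sum to $\Pr(\text{Heads} \mid l_X(u))$, while the Y-coordinate moves forward only on HH.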

Let $P$ be the set of nodes in the path traversed by the random process $RP_2$. Let the cost
incurred be $c_X = \sum_{u \in P} c_X(u)$ and $c_Y = \sum_{u \in P} c_Y(u)$. Due to the construction of $D_\pi$ from $D_\sigma$, it
follows that the expected cost incurred by $RP_1$ is equal to $E[c_Y]$. Hence, $E[c_Y] = E[D_\sigma] = g$. Next,
since $c_X(u) \le c_Y(u)$ for every node $u$, it follows that $E[c_X] \le E[c_Y] = g$. Finally, the expected cost
incurred by following mixed strategy $\pi$ for $S_g(X)$ is exactly equal to $E[c_X]$. □

Proof of Theorem 1. We use Algorithm Likelihood-Toss. By Lemma 6, the optimal adaptive strategy
also minimizes the expected number of tosses to identify a coin $i$ such that the log-likelihood
$X_i \ge B$.

The strategy adopted by Algorithm Likelihood-Toss at any stage is to toss the coin with maximum
log-likelihood. Let the Markov system associated with the one-dimensional random walk of
the log-likelihood function of the history of the coin be $S = (V, P, C, s, t)$. We have $n$ independent
and identical Markov systems $S_1 = S_2 = \cdots = S_n = S$ associated with the log-likelihood function
of the respective coin. By Theorem 4, the optimal strategy to minimize the expected number of
tosses to identify a coin $i$ such that the log-likelihood $X_i \ge B$ is to toss the coin $i$ such that $\gamma(X_i)$
is minimal. Lemma 7 shows that the grade function $\gamma(X)$ is monotonically non-increasing. Thus,
tossing the coin with maximum log-likelihood is an optimal strategy.

By the description of the algorithm, it is clear that updating the likelihood values after seeing 
the outcome of a coin toss takes constant time (since it is updated for only one coin). Thus, the 
optimal choice of coin to toss in the next step is determined by the algorithm in time linear in the 
number of coins. We observe that this can in fact be made logarithmic in the number of coins by 
maintaining a sorted ordering of the coins based on their likelihoods. □ 
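
One way to obtain the logarithmic selection time is a binary heap keyed on the negated log-likelihood, so that the coin with the maximum log-likelihood is always at the top. The sketch below uses Python's `heapq` for this; it is one possible realization of the "sorted ordering" mentioned above, not necessarily the data structure the authors had in mind, and the function name, the simulated `biases`, and the `max_tosses` guard are assumptions.

```python
import heapq
import math
import random

def likelihood_toss_heap(biases, p, eps, alpha, delta, rng, max_tosses=10**6):
    """Likelihood-Toss with O(log n) work per toss; requires p - eps > 0 and q - eps > 0."""
    q = 1.0 - p
    delta_h = math.log((p + eps) / (p - eps))
    delta_t = math.log((q + eps) / (q - eps))
    barrier = math.log((1 - alpha) * (1 - delta) / (alpha * delta))
    # Entries are (-log-likelihood, coin index); heapq is a min-heap, so the
    # coin with the largest log-likelihood sits at heap[0].
    heap = [(0.0, i) for i in range(len(biases))]
    heapq.heapify(heap)
    tosses = 0
    while -heap[0][0] < barrier and tosses < max_tosses:
        neg_x, i = heapq.heappop(heap)          # coin with maximum log-likelihood
        x = -neg_x
        x += delta_h if rng.random() < biases[i] else -delta_t
        heapq.heappush(heap, (-x, i))           # only this coin's key changed
        tosses += 1
    return heap[0][1], tosses
```

Only the tossed coin changes its key, so each step is one pop and one push, both logarithmic in the number of coins.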

5 Number of Coin Tosses 

In this section, we give an upper bound on the number of coin tosses performed by Algorithm 
Likelihood-Toss. We assume an infinite supply of coins. Thus, the algorithm repeatedly tosses
a coin while the log-likelihood of the coin is at least zero and starts with a fresh coin if the log- 
likelihood of the coin is less than zero. The algorithm terminates if the log-likelihood of a coin is 
at least B. 

Consider the random walk of the log-likelihood function. The random walk has absorbing 
barriers at B and at every state less than 0. 
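
The infinite-supply variant is straightforward to simulate, which can help build intuition for the quantities bounded below. In this sketch each fresh coin is drawn as heavy with probability $\alpha$; the function name and the use of a seeded `random.Random` are assumptions made for illustration.

```python
import math
import random

def tosses_with_fresh_coins(p, eps, alpha, delta, rng):
    """Simulate one run with an infinite supply of coins; return the total number of tosses."""
    q = 1.0 - p
    delta_h = math.log((p + eps) / (p - eps))
    delta_t = math.log((q + eps) / (q - eps))
    barrier = math.log((1 - alpha) * (1 - delta) / (alpha * delta))
    tosses = 0
    while True:
        # Draw a fresh coin from the prior: heavy with probability alpha.
        bias = p + eps if rng.random() < alpha else p - eps
        x = 0.0
        # Keep tossing while the log-likelihood stays in [0, B).
        while 0.0 <= x < barrier:
            x += delta_h if rng.random() < bias else -delta_t
            tosses += 1
        if x >= barrier:
            return tosses   # absorbed at B: output this coin
        # Otherwise the walk fell below zero; discard the coin and continue.

# Hypothetical usage: average the cost over a number of independent runs.
# runs = [tosses_with_fresh_coins(0.5, 0.2, 0.1, 0.05, random.Random(s)) for s in range(200)]
# mean_tosses = sum(runs) / len(runs)
```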

Lemma 9. Let $C$ and $D$ denote the expected number of tosses to get absorbed for a non-heavy and
heavy coin respectively. Let $\pi$ denote the probability that a heavy coin gets absorbed at $B$. Then,

1.
\[ \pi \ge \frac{\Delta_H(p+\epsilon) - \Delta_T(q-\epsilon)}{2(\Delta_H + \Delta_T)}, \]

2.
\[ \frac{D}{\pi} \le \left(\frac{8B}{\Delta_H(p+\epsilon) - \Delta_T(q-\epsilon)}\right)\left(\frac{\Delta_H + \Delta_T}{\Delta_H(p+\epsilon)}\right), \]

3.
\[ C \le \frac{2(\Delta_H + \Delta_T)}{\Delta_T(q+\epsilon) - \Delta_H(p-\epsilon)}. \]



Proof. Consider a modified random walk where the starting state is $\Delta_H$ as opposed to zero. Let
$C'$ and $D'$ denote the expected number of tosses for the modified walk to get absorbed using a
non-heavy and heavy coin respectively. Let $\pi'$ denote the probability that the modified walk gets
absorbed at $B$ using a heavy coin. Then, $D \le D' + 1 \le 2D'$, $C \le C' + 1 \le 2C'$, $\pi = (p + \epsilon)\pi'$.

We use Theorem 5. For the modified random walk, we have that $L = \Delta_H$, $W = B - \Delta_H$,
$\nu = \Delta_T$, $\mu = \Delta_H$. For the modified random walk using a heavy coin, the step sizes are

\[ X = \begin{cases} \Delta_H & \text{with probability } p + \epsilon, \\ -\Delta_T & \text{with probability } q - \epsilon, \end{cases} \]

and for the modified random walk using a non-heavy coin, the step sizes are

\[ Y = \begin{cases} \Delta_H & \text{with probability } p - \epsilon, \\ -\Delta_T & \text{with probability } q + \epsilon. \end{cases} \]

For $\epsilon > 0$, we have that $E(Y) < 0$. Therefore,

\[ C' \le \frac{\Delta_H + \Delta_T}{\Delta_T(q+\epsilon) - \Delta_H(p-\epsilon)}, \]

and hence we have the bound on $C$.

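Now consider the modified random walk using a heavy coin. For $\epsilon > 0$, we have that $E(X) > 0$.
Let $\rho_0 < 1$ be the unique real value such that $E(\rho_0^X) = 1$. Thus,

\[ \pi' \ge \frac{1 - \rho_0^{\Delta_H}}{1 - \rho_0^{B + \Delta_H}}, \qquad D' \le \frac{\Delta_H + B}{E(X)} \left(\frac{1 - \rho_0^{\Delta_H + \Delta_T}}{1 - \rho_0^{B + \Delta_T}}\right). \]

Since $\phi(\rho)$ is convex, it can be shown that the minimum value of $\phi(\rho)$ occurs at

\[ \rho_{\min} = \left(\frac{\Delta_T(q-\epsilon)}{\Delta_H(p+\epsilon)}\right)^{1/(\Delta_H + \Delta_T)}, \]

and hence, $\rho_0 \le \rho_{\min} < 1$. Thus,

\begin{align*}
\frac{D'}{\pi'} &\le \frac{\Delta_H + B}{E(X)} \cdot \frac{(1-\rho_0^{\Delta_H+\Delta_T})(1-\rho_0^{B+\Delta_H})}{(1-\rho_0^{B+\Delta_T})(1-\rho_0^{\Delta_H})} \\
&\le \frac{2B}{E(X)} \cdot \frac{1-\rho_0^{\Delta_H+\Delta_T}}{1-\rho_0^{\Delta_H}} && \text{(for $B$ larger than a constant)} \\
&\le \frac{2B}{E(X)} \cdot \frac{1-\rho_{\min}^{\Delta_H+\Delta_T}}{1-\rho_{\min}^{\Delta_H}} && \text{(since $\rho_0 \le \rho_{\min}$)} \\
&= \frac{2B}{E(X)} \cdot \frac{1 - \frac{\Delta_T(q-\epsilon)}{\Delta_H(p+\epsilon)}}{1 - \left(\frac{\Delta_T(q-\epsilon)}{\Delta_H(p+\epsilon)}\right)^{\Delta_H/(\Delta_H+\Delta_T)}} \\
&\le \frac{4B(\Delta_H+\Delta_T)}{E(X)\,\Delta_H},
\end{align*}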

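and we obtain the bound on the ratio $D/\pi$ using $D \le 2D'$ and $\pi = (p + \epsilon)\pi'$. Finally, to lower bound $\pi'$, we observe that

\begin{align*}
\pi' &\ge \frac{1-\rho_0^{\Delta_H}}{1-\rho_0^{B+\Delta_H}} \\
&\ge \frac{1-\rho_{\min}^{\Delta_H}}{1-\rho_{\min}^{B+\Delta_H}} \\
&\ge 1 - \rho_{\min}^{\Delta_H} \\
&\ge \frac{\Delta_H(p+\epsilon) - \Delta_T(q-\epsilon)}{2(\Delta_H+\Delta_T)(p+\epsilon)}.
\end{align*}

□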



Proof of Theorem 2. We use Algorithm Likelihood-Toss. Consider the one-dimensional random
walk of the log-likelihood function. The random walk has absorbing barriers at $B$ and at every
state less than 0. Let $C$ and $D$ denote the expected number of tosses to get absorbed for a
non-heavy and heavy coin respectively. Let $\pi$ denote the probability that a heavy coin gets absorbed
at $B$. Let $D_0$ and $D_1$ denote the expected number of tosses of a heavy coin to get absorbed below
0 and at $B$ respectively. Then, $D = (1 - \pi)D_0 + \pi D_1$.

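Let $E$ denote the expected number of tosses performed by Algorithm Likelihood-Toss. Then,

\begin{align*}
E &\le (1-\alpha)(C + E) + \alpha\left((1-\pi)(D_0 + E) + \pi D_1\right) \\
\Rightarrow \quad E &\le \left(\frac{1-\alpha}{\alpha}\right)\frac{C}{\pi} + \frac{D}{\pi}.
\end{align*}

By Lemma 9, we have that

\[ E \le \frac{4(\Delta_H+\Delta_T)}{\Delta_H(p+\epsilon) - \Delta_T(q-\epsilon)} \left( \left(\frac{1-\alpha}{\alpha}\right) \frac{\Delta_H+\Delta_T}{\Delta_T(q+\epsilon) - \Delta_H(p-\epsilon)} + \frac{2B}{\Delta_H(p+\epsilon)} \right). \]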

The final upper bound follows by substituting for $\Delta_H$, $\Delta_T$ and $B$ and using the following inequalities
(derived by straightforward calculus),

\[ \frac{2}{\epsilon} \ge \max\left\{ \frac{\Delta_H+\Delta_T}{\Delta_H(p+\epsilon) - \Delta_T(q-\epsilon)}, \; \frac{\Delta_H+\Delta_T}{\Delta_T(q+\epsilon) - \Delta_H(p-\epsilon)} \right\}, \]

\[ \Delta_H \ge \frac{\epsilon}{p-\epsilon}. \]

□